244. Shortest Word Distance II
Medium

Design a data structure that will be initialized with a string array, and then it should answer queries of the shortest distance between two different strings from the array.

Implement the WordDistance class:

  • WordDistance(String[] wordsDict) initializes the object with the strings array wordsDict.
  • int shortest(String word1, String word2) returns the shortest distance between word1 and word2 in the array wordsDict.

 

Example 1:

Input
["WordDistance", "shortest", "shortest"]
[[["practice", "makes", "perfect", "coding", "makes"]], ["coding", "practice"], ["makes", "coding"]]
Output
[null, 3, 1]

Explanation
WordDistance wordDistance = new WordDistance(["practice", "makes", "perfect", "coding", "makes"]);
wordDistance.shortest("coding", "practice"); // return 3
wordDistance.shortest("makes", "coding");    // return 1

 

Constraints:

  • 1 <= wordsDict.length <= 3 * 104
  • 1 <= wordsDict[i].length <= 10
  • wordsDict[i] consists of lowercase English letters.
  • word1 and word2 are in wordsDict.
  • word1 != word2
  • At most 5000 calls will be made to shortest.
Accepted
90,538
Submissions
162,973
Seen this question in a real interview before?
Companies
Related Topics

Average Rating: 4.36 (39 votes)

Solution

Before looking at the solution for this problem, let's look at what the problem asks us to do in simpler terms. We have to design a class which receives a list of words as input in the constructor. The class has a function which we need to implement and that function is shortest which takes two words as input and returns the minimum distance between the two as the output.

When the problem talks about the distance between two words, it essentially means the absolute gap between the indices of the two words in the list. For e.g. if the first word occurs at a location i and the second word occurs at the location j, then the distance between the two would be abs(i - j).

The question asks us to find the minimum such different between words which clearly indicates that the words can occur at multiple locations. If we have K occurrences for the word1 and L occurrences for the word2, then iteratively checking every pair of indices will give us a O(N2)O(N^2) algorithm which won't be optimal at all. We won't discuss that algorithm here since it is very straightforward.

The brute-force algorithm would simple consider all possible pairs of indices for (word1_location, word2_location) and see which one produces the minimum distance. Let's try and build on this idea and see if some pre-processing can help us out reduce the complexity of the brute-force algorithm.


Approach 1: Using Preprocessed Sorted Indices

Intuition

A given word can occur multiple times in the original word list. Let's suppose the first word, word1 in the input to the function shortest occurs at the indices [i1, i2, i3, i4] in the original list. Similarly, let's assume that the second word, word2, appears at the following locations inside the word list [j1, j2, j3].

Now, given these list of indices, we are to simply find the pair of indices (i, j) such that their absolute difference is minimum.

The main idea for this approach is that if the list of these indices is in sorted order, we can find such a pair in linear time.

The idea is to use a two pointer approach. Let's say we have a pointer i for the sorted list of indices of word1 and j for the sorted list of indices of word2. At every iteration, we record the difference of indices i.e. abs(word1[i] - word2[j]). Once we've done that, we have two possible choices for progressing the two pointers.

word1[i] < word2[j]

If this is the case, that means there is no point in moving the j pointer forward. The location indices for the words are in a sorted order. We know that word2[j + 1] > word2[j] because these indices are sorted. So, if we move j forward, then the difference abs(word1[i] - word2[j + 1]) would be even greater than abs(word1[i] - word2[j]). That doesn't help us since we want to find the minimum possible distance (difference) overall.

So, if we have (word1[i] < word2[j]), we move the pointer 'i' one step forward i.e. (i + 1) in the hopes that abs(word1[i + 1] - word2[j]) would give us a lower distance than abs(word1[i] - word2[j]). We say "hopes" because it is not certain this improvement would happen.

Let's look at two different examples. In the first example we will see that moving i forward gave us the best difference overall (0). In the second example we see that moving i forward leads us to our second case (yet to discuss) but doesn't lead to any improvement in the difference.

Example-1

word1_locations = [2,4,5,9]
word2_locations = [4,10,11]

i, j = 0, 0
min_diff = 2 (abs(2 - 4))
word1[i] < word2[j] i.e. 2 < 4
  move i one step forward

i, j = 1, 0 (abs(4 - 4))
min_diff = 0 (We hit the jackpot!)  

Example-2

word1_locations = [2,7,15,16]
word2_locations = [4,10,11]

i, j = 0, 0
min_diff = 2 (abs(2 - 4))
word1[i] < word2[j] i.e. 2 < 4
  move i one step forward

i, j = 1, 0
min_diff = 2 (2 < abs(7 - 4))

Here, we did not update out global minimum difference.
That is why we said earlier, moving 'i' forward may or
may not give a lower difference. But moving 'j' forward in
our case would definitely worsen the difference (or keep it same!).

Let's move onto our second scenario.

word1[i] > word2[j]

If this is the case, that means there is no point in moving the i pointer forward. We know that word1[i + 1] > word2[j] because these indices are sorted. So, if we move i forward, then the difference abs(word1[i + 1] - word2[j]) would be even greater than abs(word1[i] - word2[j]). That doesn't help us since we want to find the minimum possible distance (difference) overall.

So, along the similar lines of thought as the previous case, if we have (word1[i] > word2[j]), we move the pointer 'j' one step forward i.e. (j + 1) in the hopes that abs(word1[i] - word2[j + 1]) would give us a lower distance than abs(word1[i] - word2[j]). We say "hopes" because as showcased in the previous scenario, it is not certain this improvement would happen.

Now let's formally look at the algorithm for solving this problem.

Algorithm

  1. In the constructor of the class, we simply iterate over the given list of words and prepare a dictionary, mapping a word to all it's locations in the array.
  2. Since we process all the words from left to right, we will get all the indices in a sorted order by default for all the words. So, we don't have to sort the indices ourselves.
  3. Let's call the dictionary that we build, locations.
  4. For a given pair of words, obtain the list of indices (appearances inside the original list/array of words). Let's call the two arrays loc1 and loc2.
  5. Initialize two pointer variables l1 = 0 and l2 = 0.
  6. For a given l1 and l2, we first update (if possible) the minimum difference (distance) till now i.e. dist = min(dist, abs(loc1[l1] - loc2[l2])). Then, we check if loc1[l1] < loc2[l2] and if this is the case, we move l1 one step forward i.e. l1 = l1 + 1. Otherwise, we move l2 one step forward i.e. l2 = l2 + 1.
  7. We keep doing this until all the elements in the smaller of the two location arrays are processed.
  8. Return the global minimum distance between the words.

This represents the locations dictionary that we should build given the original words list in the constructor. The key represents the word and the value is a list containing indices in ascending order of occurrences throughout the array. Let's look at the minimum distance between the words apple and football in the array. So, we will be considering the two sorted lists of indices: [3, 6, 8, 12] and [2, 7, 9].

Current
1 / 14

Complexity analysis

  • Time complexity : The time complexity of the constructor of our class is O(N)O(N) considering there were NN words in the original list. We iterate over them and prepare a mapping from key to list of indices as described before. Then, for the function that finds the minimum distance between the two words, the complexity would be O(max(K,L))O(max(K, L)) where KK and LL represent the number of occurrences of the two words. However, K=O(N)K = O(N) and also L=O(N)L = O(N). Therefore, the overall time complexity would also be O(N)O(N). The reason the complexity is O(max(K,L))O(max(K, L)) and not O(min(K,L))O(min(K, L)) is because of the scenario where the minimum element of the smaller list is larger than all the elements of the larger list. In that scenario, the pointer for the smaller list will not progress at all and the one for the longer list will reach to the very end.
  • Space complexity: O(N)O(N) for the dictionary that we prepare in the constructor. The keys represent all the unique words in the input and the values represent all of the indices from 0...N0 ... N.

Report Article Issue

Comments: 19
yijiang0923's avatar
Read More

thanks for your article.
i believe the function that finds the minimum distance between the 2 words should be O(k + l).
consider the following case, K = L:
word1 = [1 3 5 7 9]
word2 = [2 4 6 8 10]
we will iterate all word1 and word2 based on ur logic. thus O(k + l)

33
Show 3 replies
Reply
Share
Report
kai99's avatar
Read More

just a slight mistake in the article, in the first example, word1 and word 2 seem to happen at the same index 4 which can never happen according to the problem statement

24
Reply
Share
Report
yyfyifan's avatar
Read More

This method makes sense, but why can't we just repeat the same one in Shortest Word Distance I. We just store the whole words list in the constructor, and do the linear time method to get the min distance in 'shortest' function. The overall space complexity and time complexity are both O(N), which is same as the answer above.

Can anyone explain it to me? I really want to know.

12
Show 3 replies
Reply
Share
Report
code__monkey's avatar
Read More

poorly stated problem. The problem definition is highly incomplete.

28
Show 2 replies
Reply
Share
Report
jizhouta's avatar
Read More

The description "Your method will be called repeatedly many times with different parameters." gives me an impression that it's looking for a time complexity of O(1) calling shortest() lol.

4
Reply
Share
Report
Yinglao's avatar
Read More

I think having a memo map to record calculated results could be practically temporally more efficient though this would lead to O(N2) spacial complexity.

3
Show 1 reply
Reply
Share
Report
zhang16's avatar
Read More

I used binary search with iteration for querying minimal distance, whose time complexities is
O(min(K,L)logmax(K,L)). I think it will be better in cases when L >> K.

2
Show 1 reply
Reply
Share
Report
vreshetnikov's avatar
Read More

To find the leftmost occurrence of the other word to the right of the current position of the current word we can use a binary search (upper_bound) instead of a linear search. Here is a C++ implementation that runs in 36ms (faster than 100% of other C++ submissions):

class WordDistance final {
    unordered_map<string, vector<int>> positions;

public:
    WordDistance(const vector<string>& words) noexcept {
        for (int i = 0; i < words.size(); ++i) positions[words[i]].push_back(i);
    }

    int shortest(const string& word1, const string& word2) const noexcept {
        auto& vector_1 = positions.find(word1)->second;
        auto& vector_2 = positions.find(word2)->second;

        auto current_1 = vector_1.cbegin();
        auto current_2 = vector_2.cbegin();

        auto end_1 = vector_1.cend();
        auto end_2 = vector_2.cend();

        if (*current_1 > *current_2) swap(current_1, current_2), swap(end_1, end_2);
        int distance = *current_2 - *current_1;

        while (true) {
            current_1 = upper_bound(current_1 + 1, end_1, *current_2);
            distance = min(distance, *current_2 - current_1[-1]);
            if (current_1 == end_1) return distance;
            distance = min(distance, *current_1 - *current_2);
            swap(current_1, current_2), swap(end_1, end_2);
        }
    }
};

#define endl '\n'
static const auto speedup = 
    []() { ios::sync_with_stdio(0); cin.tie(0); cout.tie(0); return 0; }();

2
Reply
Share
Report
huxuzi's avatar
Read More

The code is correct, but the comment is wrong. "# Until the shorter of the two lists is processed", not necessarily. The longer list could be consumed first.

0
Reply
Share
Report
psonlinux's avatar
Read More

Nice and intuitive article !!!

0
Reply
Share
Report
Success
Details
Runtime: 32 ms, faster than 83.50% of C++ online submissions for Shortest Word Distance II.
Memory Usage: 19.6 MB, less than 46.57% of C++ online submissions for Shortest Word Distance II.
Next challenges:
Time Submitted
Status
Runtime
Memory
Language
06/19/2021 08:58Accepted32 ms19.6 MBcpp
06/19/2021 08:53Time Limit ExceededN/AN/Acpp
/23
Contribute
|||
Saved
Trie
This list is empty.